A method of constructing syllable level Tibetan text classification corpus

Authors

Abstract

Corpora are an indispensable ingredient of statistical NLP research and real-world applications, so the corpus construction method has a direct impact on various downstream tasks. This paper proposes constructing a Tibetan text classification corpus based on a syllable-level processing technique, which we refer to as TC_TCCNL. Empirical evidence indicates that the algorithm is able to produce promising performance and may serve as a starting point for future work.
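As a rough illustration of what syllable-level processing of Tibetan text can look like, the following minimal sketch splits a string into syllables at the tsheg (U+0F0B), the mark that separates syllables in Tibetan script. This is not the TC_TCCNL procedure, which the abstract does not describe; the function name `split_syllables` and the choice to also split on the shad (U+0F0D) and on whitespace are assumptions made for this example only.

```python
import re
from typing import List

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark (shad)

# Assumption: syllables are delimited by tsheg, shad, or whitespace.
_DELIMITERS = re.compile(f"[{TSHEG}{SHAD}\\s]+")


def split_syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables, dropping empty fragments."""
    return [part for part in _DELIMITERS.split(text) if part]


if __name__ == "__main__":
    # Two syllables, each followed by a tsheg (written with escapes
    # so the example does not depend on the file's encoding).
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b"
    for syllable in split_syllables(sample):
        print(syllable)
```

A syllable list like this can be fed directly to a bag-of-features or sequence classifier, which avoids depending on a Tibetan word segmenter.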

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Tibetan Syllable-Based Functional Chunk Boundary Identification

Tibetan syntactic functional chunk parsing is aimed at identifying syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method which puts syllables in groups instead of word segmentation and tagging and use the Conditional Random Fields (CRFs) to identify the functional chunk boundary of a sentence. Accordin...

Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages

In this paper, we propose an approach to build a large scale text corpus for Tibetan natural language processing. We find the distribution of Tibetan web pages on the internet with a crawler which can identify whether or not a web page contains Tibetan text. The three biggest web sites are selected, and topic pages are selected with a rule-based method by checking the URL. The layout structures of ...

An effective procedure for constructing a hierarchical text classification system

In text categorization tasks, classification on some class hierarchies has better results than in cases without the hierarchy. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with l...

Journal

Journal title: MATEC Web of Conferences

Year: 2021

ISSN: 2261-236X, 2274-7214

DOI: https://doi.org/10.1051/matecconf/202133606013